Beyond Tag Trigrams: New Local Features for Tagging
Authors
Abstract
The set of features used by any predictive model is of pivotal importance to its performance. In this paper we show the utility, and quantify the effect, of adding features consisting of arrangements of words and tags (selected by an expert grammarian) in the local context of a trigram tagger. We look in detail at the effect of adding these features on tagging with a large syntactic and semantic tagset. We show that adding a set of such features reduces the error rate of a trigram tagger by approximately 11%.
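As a rough illustration of what "arrangements of words and tags in the local context" can look like, the sketch below builds a feature dictionary for one tagging decision. The first three features are the usual trigram-tagger context; the remaining templates, the function name `local_features`, and the `<PAD>` symbol are hypothetical choices made for this example and are not the grammarian-selected features the abstract refers to.

```python
from typing import Dict, List

PAD = "<PAD>"  # hypothetical boundary symbol for positions outside the sentence


def local_features(words: List[str], prev_tags: List[str], i: int) -> Dict[str, str]:
    """Features for predicting the tag of words[i].

    prev_tags holds the tags already assigned to words[:i].
    """
    def word(j: int) -> str:
        return words[j] if 0 <= j < len(words) else PAD

    def tag(j: int) -> str:
        return prev_tags[j] if 0 <= j < len(prev_tags) else PAD

    feats = {
        # Standard tag-trigram context: the two preceding tags and the current word.
        "t-1": tag(i - 1),
        "t-2,t-1": tag(i - 2) + " " + tag(i - 1),
        "w0": word(i),
    }
    # Hypothetical extra local features mixing words and tags.
    feats["w-1/t-1"] = word(i - 1) + "/" + tag(i - 1)
    feats["t-1,w0"] = tag(i - 1) + " " + word(i)
    feats["w+1"] = word(i + 1)
    return feats


if __name__ == "__main__":
    print(local_features(["the", "dog", "barks"], ["DT", "NN"], 2))
```

Each feature is a plain string keyed by its template name, so the dictionary can be fed to any model that accepts sparse indicator features.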
Similar resources
Beyond N in N-gram Tagging
The Hidden Markov Model (HMM) for part-of-speech (POS) tagging is typically based on tag trigrams. As such it models local context but not global context, leaving long-distance syntactic relations unrepresented. Using n-gram models for n > 3 in order to incorporate global context is problematic as the tag sequences corresponding to higher order models will become increasingly rare in training d...
A Measure Of Aggregate Syntactic Distance
We compare vectors containing counts of trigrams of part-of-speech (POS) tags in order to obtain an aggregate measure of syntax difference. Since lexical syntactic categories reflect more abstract syntax as well, we argue that this procedure reflects more than just the basic syntactic categories. We tag the material automatically and analyze the frequency vectors for POS trigrams using a permut...
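The idea of comparing count vectors of POS-tag trigrams can be sketched as follows. The code counts tag trigrams in two already-tagged samples and compares the resulting vectors. Cosine distance is used here only as a simple stand-in and is an assumption of this example; the abstract is truncated before it names the measure and the statistical test actually applied. The function names are likewise illustrative.

```python
import math
from collections import Counter
from typing import Iterable, List, Tuple

Trigram = Tuple[str, str, str]


def tag_trigram_counts(tagged_sentences: Iterable[List[str]]) -> Counter:
    """Count every window of three consecutive POS tags within each sentence."""
    counts: Counter = Counter()
    for tags in tagged_sentences:
        for k in range(len(tags) - 2):
            counts[tuple(tags[k:k + 3])] += 1
    return counts


def cosine_distance(a: Counter, b: Counter) -> float:
    """1 - cosine similarity of two sparse count vectors (assumed measure)."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # treat an empty sample as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


if __name__ == "__main__":
    sample_a = tag_trigram_counts([["DT", "NN", "VBZ", "DT", "NN"]])
    sample_b = tag_trigram_counts([["PRP", "VBZ", "JJ"]])
    print(cosine_distance(sample_a, sample_b))
```

Trigrams are counted within sentences only, so no window spans a sentence boundary.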
Improved Arabic Base Phrase Chunking with a new enriched POS tag set
Base Phrase Chunking (BPC) or shallow syntactic parsing is proving to be a task of interest to many natural language processing applications. In this paper, a BPC system is introduced that improves over state-of-the-art performance in BPC using a new part-of-speech (POS) tag set. The new POS tag set, ERTS, reflects some of the morphological features specific to Modern Standard Arabic. ERTS expl...
Part-of-speech tagging of Persian using a fuzzy network model
Part-of-speech tagging (POS tagging) is an ongoing area of research in natural language processing (NLP) applications. The process of classifying words into their parts of speech and labeling them accordingly is known as part-of-speech tagging, POS tagging, or simply tagging. Parts of speech are also known as word classes or lexical categories. The purpose of POS tagging is determining the grammatical ...
Applying Extrasentential Context To Maximum Entropy Based Tagging With A Large Semantic And Syntactic Tagset
Experiments are presented which measure the perplexity reduction derived from incorporating into the predictive model utilised in a standard tag-n-gram part-of-speech tagger, contextual information from previous sentences of a document. The tagset employed is the roughly-3000-tag ATR General English Tagset, whose tags are both syntactic and semantic in nature. The kind of extrasentential inform...